Acoustic modeling and language modeling for Cantonese LVCSR
Abstract
This paper describes our recent work on the development of a large-vocabulary, speaker-independent continuous speech recognition system for Cantonese (a major Chinese dialect). Both acoustic modeling and language modeling are addressed. For acoustic modeling, we focus on right-context-dependent sub-syllable units. HMMs are tied at both the model level and the state level, based on phonetic knowledge and the decision-tree approach. A statistical language model is built from a large amount of newspaper text. The overall recognition accuracies for syllables and Chinese characters are 81.83% and 68.94%, respectively.
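As an illustration of what "right-context-dependent sub-syllable units" can mean in practice, the following minimal sketch expands a sequence of syllables, each split into an INITIAL and a FINAL, into unit labels that carry the identity of the following unit. The function name `right_context_units`, the `unit+right` label notation (borrowed from the HTK convention), the Jyutping-style example, and the assumption that context extends across syllable boundaries are all illustrative choices, not details confirmed by the abstract.

```python
from typing import List, Tuple

# (initial, final); the initial may be "" for a null-initial syllable.
Syllable = Tuple[str, str]


def right_context_units(syllables: List[Syllable]) -> List[str]:
    """Expand syllables into right-context-dependent unit labels.

    Each label is written "unit+right", where `right` is the next
    sub-syllable unit in the utterance. The last unit has no right
    context and is emitted as a plain label. This is a hypothetical
    labeling scheme, not the paper's actual unit inventory.
    """
    # Flatten into base units, skipping empty (null) initials.
    base = [unit for initial, final in syllables
            for unit in (initial, final) if unit]

    labels = []
    for i, unit in enumerate(base):
        if i + 1 < len(base):
            labels.append(f"{unit}+{base[i + 1]}")
        else:
            labels.append(unit)
    return labels


if __name__ == "__main__":
    # Two syllables, "hoeng" and "gong", split into initial and final.
    print(right_context_units([("h", "oeng"), ("g", "ong")]))
```

In a system of this kind, each distinct label would typically correspond to one HMM, and the tying step mentioned in the abstract would then share parameters among labels with similar contexts.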
Similar resources
Language modeling for speech recognition of spoken Cantonese
This paper addresses the problem of language modeling for LVCSR of Cantonese as spoken in daily communication. As a spoken dialect, Cantonese is not used in written documents and published materials. Thus it is difficult to collect a sufficient amount of written Cantonese text data for training statistical language models. We propose to solve this problem by translating standard Chinese text,...
Sub-syllable Acoustic Modeling for Cantonese Speech Recognition
This paper presents a pioneering study on acoustic modeling for continuous Cantonese speech recognition. It starts from the context-independent modeling of sub-syllabic units, namely INITIALs and FINALs, and then moves on to examine a number of context-dependent models that characterize intra-syllable co-articulation. The acoustic models are trained with a large database of Cantonese polysyllabic ...
First Broadcast News Transcription System for Khmer Language
In this paper we present an overview of the development of a large vocabulary continuous speech recognition (LVCSR) system for Khmer, the official language of Cambodia, spoken by more than 15 million people. As Khmer is an under-resourced language, developing an LVCSR system for it is a challenging task. We describe our methodologies for quick language data collection and processing for language modeling...
Automatic speech recognition of Cantones
This paper describes our recent work on the development of a large-vocabulary, speaker-independent, continuous speech recognition system for Cantonese-English code-mixing utterances. The details of both acoustic modeling and language modeling will be discussed. For acoustic modeling, Cantonese accents in English words are handled by applying cross-lingual acoustic units, as well as modifications...
Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition
This paper presents different methods of handling pronunciation variations in Cantonese large-vocabulary continuous speech recognition. In an LVCSR system, three knowledge sources are involved: a pronunciation lexicon, acoustic models and language models. In addition, a decoding algorithm is used to search for the most likely word sequence. Pronunciation variation can be handled by explicitly m...
Publication date: 1999